Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches
Abstract
In this paper, we study the sensitivity of CNN outputs with respect to image transformations and noise in the area of fine-grained recognition. In particular, we answer the following questions: (1) How sensitive are CNNs with respect to image transformations encountered during wild image capture? (2) Can we increase the robustness of CNNs with respect to image degradations? (3) How can we predict CNN sensitivity? To answer the first question, we provide an extensive empirical sensitivity analysis of commonly used CNN architectures (AlexNet, VGG19, GoogLeNet) across various types of image degradations. This allows us to predict CNN performance for new domains made up of images that are of lower quality or captured from a different viewpoint. We also show how the sensitivity of CNN outputs can be predicted for single images. Furthermore, we demonstrate that input-layer dropout (applied during training) or pre-filtering at test time reduces CNN sensitivity only for high levels of degradation.
How sensitive are CNN approaches?
We analyze the sensitivity of three state-of-the-art CNN architectures that are widely used in recent work: AlexNet [1], VGG19 [2], and GoogLeNet [3]. We perturb test images of different datasets with common noise types (Figure 2). These include Gaussian and pepper noise, random color shifts, and image transformations such as random translations, rotations and flips. Our experiments reveal the weaknesses of a network trained on images that contain almost no noise. This matters particularly in real-world applications. Low-budget cameras may be used while the training images are noise-free, or the lighting conditions may change after training.
Can we make CNNs more robust?
Our experiments show that even small random noise can lead to a dramatic drop in performance. The question naturally arises whether robustness can be increased, either at test time or by adapting the learning.
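The perturbation types used in the sensitivity study (Gaussian noise, pepper noise, random color shifts, translations and flips) can be sketched as simple NumPy functions. This is a minimal illustration, not the authors' code. The function names, the parameter names and the assumption that images are float arrays of shape (H, W, C) scaled to [0, 1] are our own choices. So is the wrap-around translation via np.roll. Arbitrary-angle rotation is left out to avoid a dependency beyond NumPy.

```python
import numpy as np

# Hypothetical helpers (not from the paper). Images are assumed to be
# float arrays of shape (H, W, C) with intensities in [0, 1].

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def add_pepper_noise(img, p, rng):
    """Set each pixel (all channels) to black with probability p."""
    out = img.copy()
    out[rng.random(img.shape[:2]) < p] = 0.0
    return out

def random_color_shift(img, max_shift, rng):
    """Add one offset per channel, drawn uniformly from [-max_shift, max_shift], then clip."""
    shift = rng.uniform(-max_shift, max_shift, size=img.shape[2])
    return np.clip(img + shift, 0.0, 1.0)

def random_translation(img, max_px, rng):
    """Shift by up to max_px pixels along each axis; np.roll wraps pixels around the border."""
    dy, dx = rng.integers(-max_px, max_px + 1, size=2)
    return np.roll(img, shift=(int(dy), int(dx)), axis=(0, 1))

def random_flip(img, rng):
    """Mirror the image horizontally with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

# Usage: perturb a random stand-in "image" with increasing Gaussian noise.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
for sigma in (0.01, 0.05, 0.1):
    noisy = add_gaussian_noise(img, sigma, rng)
    print(sigma, float(np.abs(noisy - img).mean()))
```

You can sweep sigma (or p, max_shift, max_px) over a range of values and re-evaluate a trained classifier on the perturbed test set. This gives accuracy as a function of noise level for each perturbation type.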
In our paper, we analyze two intuitive and simple ideas for increasing robustness. The first is data augmentation by applying input dropout to the training data. The second is image pre-processing with a Gaussian or morphological filter.
Can we predict CNN sensitivity for a test image?
After the empirical analysis, the question remains whether we can quickly detect images with unstable CNN outputs. This question goes beyond a pure sensitivity study. It asks for uncertainty estimates, which are often available for Bayesian methods but not for CNNs. We present an approach (Figure 1) for estimating the sensitivity for a given input using a first-order approximation of the output change. The approach is generic enough to cover many different types of noise. At the same time, the approximation lets us compute the sensitivity with the back-propagation algorithm. Back-propagation is used to train CNN models and is therefore already available in most frameworks.
Summary
The experiments show that the influence of common intensity noise in particular is severe even at low noise levels. The reason is a domain shift between noise-free training data and perturbed test data. From our study, we can draw several conclusions:
1. The training images should have the same noise level as the test images, and care has to be taken even for small noise applied to intensities.
2. Data augmentation during training is not the solution. It decreases the accuracy on noise-free images dramatically and is only beneficial for high noise levels.
3. Noise sensitivity depends on the CNN architecture, and VGG19 proved to be the most robust one.
4. With our technique, the sensitivity of CNN outputs can be predicted for small noise levels, which allows for uncertainty estimates of CNN outputs.
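The first-order sensitivity estimate described above can be illustrated as follows. Let f(x) be the output of interest, and let the input be perturbed by additive noise e with zero mean and covariance σ²I. Then f(x + e) − f(x) ≈ ∇f(x)ᵀe, so the standard deviation of the output change is approximately σ·‖∇f(x)‖. The sketch below is a simplified stand-in, not the paper's implementation, and rests on three assumptions. The noise is isotropic Gaussian. A linear softmax classifier replaces the CNN, so the input gradient has a closed form; for a CNN, the gradient would come from back-propagation. The top-class probability is taken as f. The sketch also compares the estimate against a Monte Carlo measurement.

```python
import numpy as np

# Sketch only: a linear softmax classifier stands in for a CNN so that the
# input gradient has a closed form; a CNN would obtain it via back-propagation.
# Noise model (assumption): additive isotropic Gaussian noise N(0, sigma^2 I).

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def predicted_sensitivity(W, b, x, sigma):
    """First-order estimate of the std of the top-class probability: sigma * ||d p_c / d x||."""
    p = softmax(W @ x + b)
    c = int(np.argmax(p))
    # For logits z = W x + b: d p_c / d x = p_c * (W[c] - sum_k p_k W[k])
    grad = p[c] * (W[c] - p @ W)
    return sigma * np.linalg.norm(grad), c

def empirical_sensitivity(W, b, x, sigma, c, n=20000, seed=0):
    """Monte Carlo std of p_c(x + noise) over n Gaussian noise samples."""
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(0.0, sigma, size=(n, x.size))
    logits = noisy @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)  # stabilize each row before exp
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, c].std()

# Usage with random (hypothetical) weights and input.
rng = np.random.default_rng(1)
W = 0.1 * rng.normal(size=(5, 100))
b = np.zeros(5)
x = rng.normal(size=100)
est, c = predicted_sensitivity(W, b, x, sigma=0.01)
mc = empirical_sensitivity(W, b, x, sigma=0.01, c=c)
print(f"first-order: {est:.5f}  Monte Carlo: {mc:.5f}")
```

For small σ, the two numbers should agree closely. For large σ, the linear estimate keeps growing in proportion to σ. The true standard deviation of a probability, however, can never exceed 0.5. The approximation is therefore only trustworthy for small noise levels, which matches the conclusion above.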
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for extracting features and also for acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNN which has a bottleneck layer among its fully connected layers. The bottleneck fea...
A hybrid EEG-based emotion recognition approach using Wavelet Convolutional Neural Networks (WCNN) and support vector machine
Nowadays, deep learning and convolutional neural networks (CNNs) have become widespread tools in many biomedical engineering studies. A CNN is an end-to-end tool that integrates the processing procedure, but in some situations it needs to be fused with machine learning methods to be more accurate. In this paper, a hybrid approach based on deep features extracted from Wave...
Estimation of Hand Skeletal Postures by Using Deep Convolutional Neural Networks
Hand posture estimation attracts researchers because of its many applications. Hand posture recognition systems simulate the hand postures by using mathematical algorithms. Convolutional neural networks have provided the best results in hand posture recognition so far. In this paper, we propose a new method to estimate the hand skeletal posture by using deep convolutional neural networks. T...
Weakly-supervised Discriminative Patch Learning via CNN for Fine-grained Recognition
Research on fine-grained recognition has recently shifted from multistage frameworks to convolutional neural networks (CNN) that are trained end-to-end. Many previous end-to-end deep approaches typically consist of a recognition network and an auxiliary localization network trained with additional part annotations to detect semantic parts shared across classes. To avoid the cost of extra semant...
Fine-grained Recognition Datasets for Biodiversity Analysis
In the following paper, we present and discuss challenging applications for fine-grained visual classification (FGVC): biodiversity and species analysis. We not only give details about two challenging new datasets suitable for computer vision research with up to 675 highly similar classes, but also present first results with localized features using convolutional neural networks (CNN). We concl...
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
Journal title:
- CoRR
Volume: abs/1610.06756
Publication date: 2016